1. INTRODUCTION

Xia and Tong have written a provocative and stimulating
paper. Among the many topics raised in their paper, I would
like in particular to endorse several of their postulates:

1. All models are wrong. 

2. Observations are not error-free. 

3. Estimation needs to account for the above two 
issues. 

As described in the paper, suppose that we observe a process
$\{y_t : t = 1, \ldots\}$ for which we have a model
$\{x_t(\theta) : t = 1, \ldots\}$ which depends upon an unknown
parameter $\theta$. Let $F_x(\theta)$ denote the joint distribution
of the $x_t(\theta)$ process and $F_y$ the joint distribution of the
observables. When we say that the model is wrong, we mean that
there is no $\theta$ such that $F_x(\theta) = F_y$. If we think of
the distribution $F_y$ as a member of a large space of potential
joint distributions, then the set of joint distributions
$F_x(\theta)$ constitutes a low-dimensional subspace of this larger
space. While there is no true $\theta$, we can define the
pseudo-true $\theta$ as the value which makes $F_x(\theta)$ as close
as possible to $F_y$. This requires specifying a distance metric
between the joint distributions

$$d(\theta) = d(F_x(\theta), F_y)$$

and then we can define the best-fitting model $F_x(\theta)$ by
selecting $\theta$ to minimize $d(\theta)$. The relevant question is
then: what is the appropriate distance metric?
2. CATCH-ALL ESTIMATION

Xia and Tong recommend what they call a "catch-all" approach,
where the distance metric is a weighted sum of squared $k$-step
forecast residuals. They show that in some situations this
criterion allows consistent estimation of the parameters of the
true latent process. Their Theorem C requires that the latent
process is deterministic, but the result might hold more broadly.

This can be illustrated in a very simple example of a latent
AR(1) with additive measurement error. Suppose that the latent
process is

$$x_t = \theta x_{t-1} + e_t$$

and the observed process is

$$y_t = x_t + \eta_t,$$

where $e_t$ and $\eta_t$ are independent white noise. In this case,
it is well known that $y_t$ has an ARMA(1,1) representation

$$y_t = \theta y_{t-1} + u_t - \alpha u_{t-1}, \tag{1}$$

where $u_t$ is white noise and $0 < \alpha < 1$.
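The mapping from the latent AR(1) parameters to the ARMA(1,1) parameters in (1) can be sketched numerically. The snippet below is an illustration added here, not part of Xia and Tong's paper: the function name and the variance arguments `var_e` and `var_eta` (the variances of $e_t$ and $\eta_t$) are hypothetical, and it assumes $0 < \theta < 1$ with both variances positive. It matches the variance and first autocovariance of $y_t - \theta y_{t-1}$ to those of an MA(1).

```python
import math

def arma_from_ar1_plus_noise(theta, var_e, var_eta):
    """Return (alpha, var_u) for the ARMA(1,1) form of an AR(1) observed with noise.

    w_t = y_t - theta*y_{t-1} = e_t + eta_t - theta*eta_{t-1} is an MA(1),
    so we match its variance and lag-1 autocovariance to u_t - alpha*u_{t-1}.
    """
    g0 = var_e + (1.0 + theta ** 2) * var_eta   # Var(w_t)
    g1 = -theta * var_eta                       # Cov(w_t, w_{t-1})
    if g1 == 0.0:
        return 0.0, g0                          # no measurement error: pure AR(1)
    r = -g1 / g0                                # equals alpha / (1 + alpha^2)
    # Invertible root (|alpha| < 1) of r*alpha^2 - alpha + r = 0.
    alpha = (1.0 - math.sqrt(1.0 - 4.0 * r * r)) / (2.0 * r)
    var_u = g0 / (1.0 + alpha ** 2)
    return alpha, var_u

# Hypothetical values, chosen only for illustration.
alpha, var_u = arma_from_ar1_plus_noise(theta=0.8, var_e=1.0, var_eta=1.0)
```

Under these assumptions the moment conditions imply $\alpha \sigma_u^2 = \theta \sigma_\eta^2$, so $\alpha$ is positive whenever $\theta$ is positive and measurement error is present.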

Xia and Tong propose estimation based on $k$-step forecast
errors. The $k$-step forecast equation for the observables is

$$y_{t-1+k} = \theta^k y_{t-1} + e_t(k), \tag{2}$$

where

$$e_t(k) = \sum_{j=0}^{k-1} \theta^j \left(u_{t+k-j-1} - \alpha u_{t+k-j-2}\right).$$
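The $k$-step equation follows from substituting the ARMA(1,1) representation into itself repeatedly. The short derivation below is a sketch filled in here for the reader, not taken from Xia and Tong's paper; it assumes only that $y_t = \theta y_{t-1} + u_t - \alpha u_{t-1}$ holds at every date.

```latex
\begin{align*}
y_{t-1+k} &= \theta y_{t-2+k} + (u_{t-1+k} - \alpha u_{t-2+k}) \\
          &= \theta^2 y_{t-3+k} + \theta (u_{t-2+k} - \alpha u_{t-3+k})
             + (u_{t-1+k} - \alpha u_{t-2+k}) \\
          &\;\;\vdots \\
          &= \theta^k y_{t-1}
             + \sum_{j=0}^{k-1} \theta^j (u_{t+k-j-1} - \alpha u_{t+k-j-2}).
\end{align*}
```

The $j = k-1$ term contains $-\alpha u_{t-1}$, which is correlated with $y_{t-1}$. This appears to be why least squares applied to the $k$-step equation does not recover $\theta^k$ exactly for finite $k$.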



Xia and Tong's estimator is based on a weighted average of
squared forecast errors. For simplicity, suppose all the weight
is on the $k$th forecast error. The estimator is

$$\hat\theta_{\{k\}} = \operatorname*{argmin}_{\theta} \sum_{t=1}^{T} \left(y_{t-1+k} - \theta^k y_{t-1}\right)^2,$$

which has the explicit solution

$$\hat\theta_{\{k\}} = \left(\frac{\sum_{t=1}^{T} y_{t-1+k}\, y_{t-1}}{\sum_{t=1}^{T} y_{t-1}^2}\right)^{1/k}.$$
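A brief sketch of where the closed form comes from, added here for clarity. It treats $\beta = \theta^k$ as the free parameter (the symbol $\beta$ is introduced only for this illustration), so that the problem becomes an ordinary least squares regression of $y_{t-1+k}$ on $y_{t-1}$.

```latex
\begin{align*}
\hat\beta &= \operatorname*{argmin}_{\beta} \sum_{t=1}^{T} (y_{t-1+k} - \beta y_{t-1})^2
           = \frac{\sum_{t=1}^{T} y_{t-1+k}\, y_{t-1}}{\sum_{t=1}^{T} y_{t-1}^2}, \\
\hat\theta_{\{k\}} &= \hat\beta^{1/k} \qquad \text{(when } \hat\beta > 0 \text{)}.
\end{align*}
```

The positivity condition is an assumption of this sketch: for even $k$ a negative $\hat\beta$ has no real $k$th root.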

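We calculate that as $T \to \infty$

$$\hat\theta_{\{k\}} \to_p \theta_{\{k\}} = \theta (1 - c)^{1/k},$$

where $c = \alpha\sigma_u^2 / (\theta\sigma_y^2)$, $\sigma_u^2 = E u_t^2$
and $\sigma_y^2 = E y_t^2$.

Thus for any $k$, $\hat\theta_{\{k\}}$ is inconsistent as an estimator
of $\theta$. But as $k$ gets large the discrepancy gets smaller, as
$(1 - c)^{1/k} \to 1$ since $c < 1$. Thus as $k \to \infty$

$$\theta_{\{k\}} \to \theta. \tag{3}$$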
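The shrinking discrepancy can be checked numerically. The sketch below is an illustration added here, with hypothetical function names and parameter values. It assumes $0 < \theta < 1$ and uses the fact that, in this example, $\alpha\sigma_u^2 = \theta\sigma_\eta^2$, so that $c$ equals the share of measurement error in the variance of $y_t$. The simulation uses Gaussian noise, which the text does not require.

```python
import random

def plim_theta_k(theta, var_e, var_eta, k):
    """Large-sample limit theta*(1-c)^(1/k) of the k-step estimator."""
    var_x = var_e / (1.0 - theta ** 2)      # stationary variance of the latent AR(1)
    c = var_eta / (var_x + var_eta)         # noise share of Var(y_t)
    return theta * (1.0 - c) ** (1.0 / k)

def theta_hat_k(y, k):
    """Closed-form k-step estimator: (sum y_{t+k} y_t / sum y_t^2)^(1/k)."""
    n = len(y) - k
    num = sum(y[t + k] * y[t] for t in range(n))
    den = sum(y[t] ** 2 for t in range(n))
    ratio = num / den
    if ratio <= 0.0:
        raise ValueError("non-positive ratio: the k-th root is not defined")
    return ratio ** (1.0 / k)

def simulate_y(theta, var_e, var_eta, n, seed=0):
    """Simulate y_t = x_t + eta_t with x_t a Gaussian AR(1) started at zero."""
    rng = random.Random(seed)
    x, y = 0.0, []
    for _ in range(n):
        x = theta * x + rng.gauss(0.0, var_e ** 0.5)
        y.append(x + rng.gauss(0.0, var_eta ** 0.5))
    return y

# Hypothetical values, chosen only for illustration.
limits = [plim_theta_k(0.8, 1.0, 1.0, k) for k in (1, 2, 5, 20, 1000)]
y = simulate_y(0.8, 1.0, 1.0, n=100_000, seed=1)
estimates = {k: theta_hat_k(y, k) for k in (1, 3)}
```

With these values the limits rise monotonically toward 0.8 as $k$ grows, and the simulated estimates should sit close to the corresponding limits rather than to $\theta$ itself.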

This derivation assumed that the estimator is based on the $k$th
forecast error, but it extends to the case of a weighted average.

The convergence (3) is an extension of Xia and Tong's Theorem C.
It shows that estimation by minimizing the squared $k$-step
forecast residual is consistent for the parameter of the latent
AR(1), as $k$ is made large.

One trouble with this approach is that the estimator is quite
inefficient. We can calculate that

Tvar (%})-(^fc) 

as $k \to \infty$. This means that the variance of the Xia-Tong
estimator is increasing in $k$ (and unbounded). This is especially
troubling since the parameters of (1) can be estimated by standard
ARMA methods. The implication is that while the catch-all approach
has some useful robustness properties, there is no reason to
expect the estimator to be efficient.
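A rough delta-method sketch of why the variance grows with $k$ is given below. It is an illustration added here and is not a reconstruction of the displayed rate in the original. The symbols $\hat\rho_k$, $g$ and $V_k$ are introduced only for this sketch, $V_k$ denotes the asymptotic variance of $\hat\rho_k$, and the argument assumes $0 < \theta < 1$, a fixed $k$ as $T \to \infty$, and that $V_k$ stays bounded away from zero as $k$ increases.

```latex
\begin{align*}
\hat\theta_{\{k\}} &= g(\hat\rho_k), \qquad g(r) = r^{1/k}, \qquad
\hat\rho_k = \frac{\sum_t y_{t-1+k}\, y_{t-1}}{\sum_t y_{t-1}^2}
\to_p \rho_k = \theta^k (1 - c), \\
g'(\rho_k)^2 &= \frac{1}{k^2}\, \rho_k^{2/k - 2}
              = \frac{\theta^{2 - 2k}}{k^2}\, (1 - c)^{2/k - 2}, \\
T \operatorname{var}(\hat\theta_{\{k\}}) &\approx g'(\rho_k)^2\, V_k .
\end{align*}
```

Under these assumptions the factor $\theta^{-2k}$ dominates $k^{-2}$, so the approximate variance eventually increases geometrically in $k$, which is consistent with the unbounded variance described in the text.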

3. MEASUREMENT ERROR AND 
NONPARAMETRIC IDENTIFICATION 

Xia and Tong emphasize that measurement error is empirically
relevant and time series methods should take it seriously. While
I agree, we also need to acknowledge that measurement error raises
many troubling problems. Of primary importance, I believe, is the
vexing issue of nonparametric identification: whether the
parameters of interest are uniquely determined by the distribution
of the observables. As is known from the random sampling context,
measurement error complicates identification. In general,
additional information or structure is required to identify the
parameters of an unobserved latent process. It is not sufficient
to simply introduce a new estimator.

We can see this quite simply by examining the spectral density.
Suppose as above that $x_t$ is the process of interest and the
observed process is $y_t = x_t + \eta_t$ where $\eta_t$ is i.i.d.
measurement error with variance $\sigma_\eta^2$. Letting
$f_x(\lambda)$ and $f_y(\lambda)$ be the spectral densities of $x_t$
and $y_t$, we know that

$$f_y(\lambda) = f_x(\lambda) + \sigma_\eta^2.$$

The distribution of the observables $y_t$ identifies $f_y(\lambda)$,
but $f_x(\lambda)$ is not identified from knowledge of $f_y(\lambda)$
alone. Under the realistic assumption that $\sigma_\eta^2$ is
unknown, $f_x(\lambda)$ can only be identified by knowledge of the
structure of $x_t$ [e.g., by knowing that $x_t$ is an AR(1) as in
the example of the previous section]. But if we acknowledge that
our models for $x_t$ are misspecified, we should view the true
$f_x(\lambda)$ as nonparametric and hence without structure. It
follows that the spectral density $f_x(\lambda)$ is not
nonparametrically identified, and thus neither is the
autocorrelation structure of $x_t$.

Nevertheless, some features are identified. While the spectral
density is not point identified, it is interval identified. Let
$\underline{f} = \min_\lambda f_y(\lambda)$. Observe that

$$f_y(\lambda) - \underline{f} \le f_x(\lambda) \le f_y(\lambda).$$

The two bounds are identified from $f_y(\lambda)$, so the spectral
density $f_x(\lambda)$ of $x_t$ can be bounded within this interval.
The width of the interval is
$\underline{f} = \min_\lambda f_x(\lambda) + \sigma_\eta^2$, which is
thus an upper bound for the measurement error variance
$\sigma_\eta^2$.
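The interval bounds can be illustrated with a latent AR(1). The sketch below is an illustration added here with hypothetical names and parameter values. It assumes the normalization used in the text, in which white noise has a flat spectrum equal to its variance (no $1/(2\pi)$ factor), and it evaluates the densities on a finite frequency grid rather than over all $\lambda$.

```python
import math

def ar1_spectrum(lam, theta, var_e):
    """Spectrum of a stationary AR(1), normalized so white noise has spectrum var."""
    return var_e / (1.0 - 2.0 * theta * math.cos(lam) + theta ** 2)

def spectral_bounds(f_y_vals):
    """Return (f_low, bounds), where bounds[i] = (f_y - f_low, f_y) brackets f_x."""
    f_low = min(f_y_vals)
    bounds = [(fy - f_low, fy) for fy in f_y_vals]
    return f_low, bounds

# Hypothetical values, chosen only for illustration.
theta, var_e, var_eta = 0.8, 1.0, 0.5
grid = [math.pi * i / 200 for i in range(201)]          # frequencies in [0, pi]
f_x = [ar1_spectrum(lam, theta, var_e) for lam in grid]  # unobserved in practice
f_y = [fx + var_eta for fx in f_x]                       # identified from the data
f_low, bounds = spectral_bounds(f_y)
```

Only `f_y` would be available to an analyst; `f_x` and `var_eta` are computed here solely to confirm that the latent spectrum lies inside the bounds and that `f_low` is no smaller than the measurement error variance.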

What is particularly interesting is that while the level of
$f_x(\lambda)$ is not identified, many of its most important
features are identified, specifically, the peaks and troughs. What
this means is that while full knowledge of the $x_t$ process is not
possible, important features can be identified from the
distribution of the observables $y_t$. Knowledge of which features
are identified in the presence of measurement error and/or
misspecification helps focus attention on what can be learned
about unobserved processes from observational data.
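Because the measurement error adds a constant to the spectrum, the locations of peaks and troughs are unchanged. The sketch below illustrates this with a latent AR(2) that has an interior spectral peak. It is an example added here: the AR(2) coefficients, the noise variance and the function names are hypothetical, and the same normalization without the $1/(2\pi)$ factor is assumed.

```python
import cmath
import math

def ar2_spectrum(lam, phi1, phi2, var_e=1.0):
    """Spectrum of x_t = phi1*x_{t-1} + phi2*x_{t-2} + e_t at frequency lam."""
    z = cmath.exp(-1j * lam)
    return var_e / abs(1.0 - phi1 * z - phi2 * z * z) ** 2

def argmax(vals):
    """Index of the largest element (first one if there are ties)."""
    return max(range(len(vals)), key=vals.__getitem__)

def argmin(vals):
    """Index of the smallest element (first one if there are ties)."""
    return min(range(len(vals)), key=vals.__getitem__)

# Hypothetical values, chosen only for illustration.
phi1, phi2, var_eta = 1.0, -0.5, 2.0
grid = [math.pi * i / 500 for i in range(501)]
f_x = [ar2_spectrum(lam, phi1, phi2) for lam in grid]   # latent spectrum
f_y = [fx + var_eta for fx in f_x]                      # observed spectrum

peak_x, peak_y = grid[argmax(f_x)], grid[argmax(f_y)]
trough_x, trough_y = grid[argmin(f_x)], grid[argmin(f_y)]
```

In this example the peak frequency read off the observed spectrum coincides with that of the latent spectrum, even though the level of the latent spectrum cannot be recovered without knowing `var_eta`.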